The Hospital data can be obtained directly from the web or downloaded and then read from a local drive. Both methods are presented below.
In the code block below, the data are read from the web. The URL for the CSV file is copied and pasted from the web page. The head function then displays the first six rows so that we can check that the data were read properly.
hosp1 <- read.csv("http://facweb1.redlands.edu/fac/jim_bentley/Data/MATH%20111%20Examples/Hospitals/hospitals.csv")
head(hosp1)
## hospital condition survival
## 1 A Good Survived
## 2 A Good Survived
## 3 A Good Survived
## 4 A Good Survived
## 5 A Good Survived
## 6 A Good Survived
Reading data from a local drive is done in a similar manner. Below it is assumed that the CSV file is stored in the RStudio project folder.
hosp2 <- read.csv("hospitals.csv")
head(hosp2)
## hospital condition survival
## 1 A Good Survived
## 2 A Good Survived
## 3 A Good Survived
## 4 A Good Survived
## 5 A Good Survived
## 6 A Good Survived
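Because both examples depend on a file that may not be reachable, the sketch below shows one way to confirm that an import worked. It rebuilds a stand-in for the data from the four cell counts in the two-way table given later in this section, writes it to a temporary CSV file, and reads it back. The condition variable is left out because its distribution is not reported here, and the names cells, hosp_demo, csv_path, and hosp3 are made up for this illustration. The stringsAsFactors = TRUE argument is included because, starting with R 4.0.0, read.csv leaves text columns as character vectors unless asked to convert them to factors.

```r
# Cell counts taken from the two-way table shown later in this section.
# All object names here are hypothetical.
cells <- data.frame(
  hospital = c("A", "A", "B", "B"),
  survival = c("Died", "Survived", "Died", "Survived"),
  n        = c(63, 2037, 16, 784)
)

# Expand to one row per patient (the condition column is omitted)
hosp_demo <- cells[rep(seq_len(nrow(cells)), cells$n), c("hospital", "survival")]
rownames(hosp_demo) <- NULL

# Write to a temporary CSV file and read it back, as one would a local file
csv_path <- tempfile(fileext = ".csv")
write.csv(hosp_demo, csv_path, row.names = FALSE)
hosp3 <- read.csv(csv_path, stringsAsFactors = TRUE)

# Quick checks on the import: dimensions and column types
dim(hosp3)
str(hosp3)
```

If str reports the columns as chr rather than Factor, the text columns were not converted and functions that expect factors may complain.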
Tables for a single categorical variable are easy to produce using R. A frequency table for the hospital variable in the hosp1 data frame is produced below.
table(hosp1$hospital)
##
## A B
## 2100 800
We note that 2100 people went to hospital A and 800 went to hospital B.
To get the proportions and percentages of cases in each category we use the prop.table function.
prop.table(table(hosp1$hospital))
##
## A B
## 0.7241379 0.2758621
print(prop.table(table(hosp1$hospital)), digits=4)
##
## A B
## 0.7241 0.2759
prop.table(table(hosp1$hospital))*100
##
## A B
## 72.41379 27.58621
print(100*prop.table(table(hosp1$hospital)), digits=4)
##
## A B
## 72.41 27.59
The digits option of the print function sets the number of significant digits shown, which makes it easy to control how “pretty” the output is.
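As a check on what prop.table is doing, the proportions can also be computed by hand as each count divided by the total. The sketch below uses the two counts reported above; the names counts and props are chosen for this illustration only. It uses round, which fixes the number of decimal places, as an alternative to the digits option.

```r
# Counts for hospitals A and B, as reported in the frequency table above
counts <- c(A = 2100, B = 800)

# A proportion is a count divided by the total number of cases
props <- counts / sum(counts)

# round() controls decimal places rather than significant digits
round(props, 4)
round(100 * props, 2)

# prop.table() on the same counts gives the same proportions
all.equal(prop.table(counts), props)
```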
Similar tables can be computed for the survival variable:
table(hosp2$survival)
##
## Died Survived
## 79 2821
prop.table(table(hosp2$survival))
##
## Died Survived
## 0.02724138 0.97275862
Creating plots is just as easy as generating tables. First note that hospital is a categorical variable. R stores such variables as “factors” and will not do arithmetic on them. R’s plot functions also handle categorical data sensibly.
We can create a bar graph in a number of ways. Two quick ones are presented below.
### Using base plots
barplot(table(hosp2$hospital))
### Using lattice graphics
p_load(lattice)  # p_load is from the pacman package; library(lattice) also works
histogram(~hospital, data=hosp1)
histogram(~hospital, data=hosp1, type="count")
histogram(~hospital, data=hosp1, type="density")
Note that the type option switches the vertical axis among percentages (the default here), counts, and density. Because each category's bar has width one, the density equals the proportion.
While R will create pie charts, they should generally be avoided because they are harder to interpret than bar charts. If you must make one, the pie function works.
pie(table(hosp1$hospital), col=c("skyblue","pink"))
Two-way, or RxC, frequency tables are equally easy to construct in R. We will use the hospital data from above to look at how they may be generated.
table(hosp1$survival,hosp1$hospital)
##
##               A    B
##   Died       63   16
##   Survived 2037  784
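Conversion to table, row, and column proportions and percentages is carried out using prop.table. Its second argument selects the margin: omit it for proportions of the grand total, use 1 for row proportions, and use 2 for column proportions.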
### Table values
prop.table(table(hosp1$survival,hosp1$hospital))
##
##                      A           B
##   Died     0.021724138 0.005517241
##   Survived 0.702413793 0.270344828
print(100*prop.table(table(hosp1$survival,hosp1$hospital)), digits=4)
##
##                  A       B
##   Died      2.1724  0.5517
##   Survived 70.2414 27.0345
### Row values
prop.table(table(hosp1$survival,hosp1$hospital),1)
##
##                    A         B
##   Died     0.7974684 0.2025316
##   Survived 0.7220844 0.2779156
print(100*prop.table(table(hosp1$survival,hosp1$hospital),1), digits=4)
##
##                A     B
##   Died     79.75 20.25
##   Survived 72.21 27.79
### Column values
prop.table(table(hosp1$survival,hosp1$hospital),2)
##
##               A    B
##   Died     0.03 0.02
##   Survived 0.97 0.98
print(100*prop.table(table(hosp1$survival,hosp1$hospital),2), digits=4)
##
##             A  B
##   Died      3  2
##   Survived 97 98
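A quick way to keep the three kinds of proportions straight is to check which totals equal one. The sketch below rebuilds the two-way table from the four counts shown above, so it runs without the CSV file; the names tab, row_props, and col_props are chosen for this illustration. The addmargins function, which appends row and column totals, is not used elsewhere in this section.

```r
# Two-way table rebuilt from the counts above (matrix() fills column by column)
tab <- as.table(matrix(
  c(63, 2037, 16, 784), nrow = 2,
  dimnames = list(survival = c("Died", "Survived"), hospital = c("A", "B"))
))

# Append row and column totals to the counts
addmargins(tab)

# Row proportions: each row sums to 1
row_props <- prop.table(tab, margin = 1)
rowSums(row_props)

# Column proportions: each column sums to 1
col_props <- prop.table(tab, margin = 2)
colSums(col_props)

# Death rate within each hospital, read from the column proportions
col_props["Died", ]
```

The column proportions are the ones that answer "what fraction of each hospital's patients died?", which is why they are used for the plots that follow.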
Statkey generates stacked bar charts. In R we can produce stacked plots based on counts, proportions, or percentages.
### Store the frequencies so that the code is easier to read
freq <- table(hosp1$survival,hosp1$hospital)
### Modify the margins to make room for the legend
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
### Make the plot and add the legend
barplot(freq, col=heat.colors(length(rownames(freq))), width=2, ylab = "Count", xlab = "Hospital")
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(freq))), legend=rownames(freq))
### Store the proportions so that the code is easier to read
prop <- prop.table(table(hosp1$survival,hosp1$hospital),2)
### Modify the margins to make room for the legend
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
### Make the plot and add the legend
barplot(prop, col=heat.colors(length(rownames(prop))), width=2, ylab = "Proportion", xlab = "Hospital")
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(prop))), legend=rownames(prop))
### Store the percentages so that the code is easier to read
perc <- 100*prop.table(table(hosp1$survival,hosp1$hospital),2)
### Modify the margins to make room for the legend
par(mar=c(5.1, 4.1, 4.1, 7.1), xpd=TRUE)
### Make the plot and add the legend
barplot(perc, col=heat.colors(length(rownames(perc))), width=2, ylab = "Percentage", xlab = "Hospital")
legend("topright",inset=c(-0.25,0), fill=heat.colors(length(rownames(perc))), legend=rownames(perc))
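The three stacked plots above repeat the same steps with a different table and axis label, so they could be wrapped in a small helper. The function below is only a sketch of that idea: stacked_bar is a hypothetical name, not a function from any package, and the table is rebuilt from the counts shown earlier so the example runs on its own. Unlike the code above, it restores the previous par settings when it finishes.

```r
# Two-way table rebuilt from the counts reported earlier in this section
tab <- as.table(matrix(
  c(63, 2037, 16, 784), nrow = 2,
  dimnames = list(survival = c("Died", "Survived"), hospital = c("A", "B"))
))

# Hypothetical helper: stacked barplot of a two-way table with a side legend
stacked_bar <- function(tab, ylab) {
  # Widen the right margin for the legend; restore the old settings on exit
  old_par <- par(mar = c(5.1, 4.1, 4.1, 7.1), xpd = TRUE)
  on.exit(par(old_par))
  cols <- heat.colors(nrow(tab))
  mids <- barplot(tab, col = cols, ylab = ylab, xlab = "Hospital")
  legend("topright", inset = c(-0.25, 0), fill = cols, legend = rownames(tab))
  invisible(mids)  # bar midpoints, as returned by barplot()
}

# Counts, column proportions, and column percentages
mids <- stacked_bar(tab, "Count")
stacked_bar(prop.table(tab, 2), "Proportion")
stacked_bar(100 * prop.table(tab, 2), "Percentage")
```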
Side-by-side or lattice barplots are often easier to read than stacked ones.
### Load the lattice library
p_load(lattice)
### Barplot of survival frequencies by hospital
histogram(~survival|hospital, data=hosp1, type="count")
### Barplot of survival proportion by hospital
histogram(~survival|hospital, data=hosp1, type="density", ylab = "Proportion")
### Barplot of survival percentage by hospital
histogram(~survival|hospital, data=hosp1, type="percent")
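A side-by-side version can also be drawn with base graphics by passing beside = TRUE to barplot. The sketch below rebuilds the table from the counts shown earlier so that it runs on its own; the names tab, col_perc, and mids are chosen for this illustration, and the ylim value is simply a guess at enough headroom for the legend.

```r
# Two-way table rebuilt from the counts reported earlier in this section
tab <- as.table(matrix(
  c(63, 2037, 16, 784), nrow = 2,
  dimnames = list(survival = c("Died", "Survived"), hospital = c("A", "B"))
))

# Column percentages: survival outcome within each hospital
col_perc <- 100 * prop.table(tab, margin = 2)

# beside = TRUE places the bars next to each other instead of stacking them
mids <- barplot(col_perc, beside = TRUE,
                col = heat.colors(nrow(col_perc)),
                xlab = "Hospital", ylab = "Percentage",
                ylim = c(0, 125),  # extra headroom so the legend clears the bars
                legend.text = rownames(col_perc),
                args.legend = list(x = "top", bty = "n"))
```

With beside = TRUE, barplot returns a matrix of bar midpoints with one row per outcome and one column per hospital, which can be used to add labels with text.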
Less informative in this situation are the plots that condition on the rows, that is, on survival. The only difference between these plots and those above is that the two variables are switched in the formula.
### Load the lattice library
p_load(lattice)
### Barplot of hospital frequencies by survival
histogram(~hospital|survival, data=hosp1, type="count")
### Barplot of hospital proportions by survival
histogram(~hospital|survival, data=hosp1, type="density", ylab = "Proportion")
### Barplot of hospital percentages by survival
histogram(~hospital|survival, data=hosp1, type="percent")